
Conversation

@sarthakaggarwal97 (Contributor) commented Oct 9, 2025

Resolves #2696

The primary issue was that, in sanitizer mode, the test needed more time for the primary’s replication buffers to grow beyond 2 × backlog_size. Increasing repl-timeout to 30s ensures that the inactive replica is not disconnected while the full sync is in progress. rdb-key-save-delay throttles the rate at which data is written to the client output buffer; with it, we can deterministically complete the full sync within 10s (10000 keys × 0.001s per key).
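
For illustration, a minimal sketch of what these overrides could look like in the Tcl test harness (`start_server`, `r`, and `debug populate` are the suite's existing helpers; the exact test body in the repo may differ):

```tcl
start_server {tags {"repl"}} {
    # Keep the inactive replica connected while the full sync runs.
    r config set repl-timeout 30
    # Delay each key written during the RDB save by 1000 us (0.001 s),
    # so 10000 keys take ~10 s: slow enough for the output buffer to
    # grow past 2x backlog_size, fast enough to finish within repl-timeout.
    r config set rdb-key-save-delay 1000
    r debug populate 10000
}
```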

Increasing the wait_for_condition retry count gives the test enough retries to verify that mem_total_replication_buffers reaches the required 2 × backlog_size.
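
As a sketch, the retry loop might look roughly like this (the retry counts are illustrative, `$backlog_size` is assumed to hold the configured repl-backlog-size, and `[s <field>]` is the suite's helper for reading an INFO field):

```tcl
# Up to 500 retries, 100 ms apart, for the buffers to grow.
wait_for_condition 500 100 {
    [s mem_total_replication_buffers] > 2 * $backlog_size
} else {
    fail "Buffers did not reach 2x backlog_size,\
        current: [s mem_total_replication_buffers]"
}
```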

The test has passed the past 7 consecutive iterations of test-sanitizer-address in my daily runs. I amended the log message to show the current buffer size if it doesn't reach 2 × backlog_size.

Signed-off-by: Sarthak Aggarwal <[email protected]>
codecov bot commented Oct 9, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.62%. Comparing base (155b0bb) to head (db6e772).
⚠️ Report is 13 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #2715      +/-   ##
============================================
+ Coverage     72.40%   72.62%   +0.21%     
============================================
  Files           128      128              
  Lines         71273    71273              
============================================
+ Hits          51606    51759     +153     
+ Misses        19667    19514     -153     

see 19 files with indirect coverage changes


@zuiderkwast (Contributor) left a comment

LGTM

@enjoy-binbin (Member) left a comment

Can you give some clues? That is, how (or where) did you determine the timeout, and why did increasing rdb-key-save-delay help? We can add more info to the top comment and then merge it. The analysis is also very useful for these timing issues.

@sarthakaggarwal97 (Contributor, Author)

@enjoy-binbin thank you for taking a look. I shared my thought process in the PR description, please let me know if it doesn't make sense!

@enjoy-binbin enjoy-binbin merged commit 981b8fe into valkey-io:unstable Oct 16, 2025
52 checks passed
diego-ciciani01 pushed a commit to diego-ciciani01/valkey that referenced this pull request Oct 21, 2025


Development

Successfully merging this pull request may close these issues.

[TEST-FAILURE] Primary COB growth with inactive replica
